Better Than Reference In Low Light Image 
Enhancement: Conditional Re-Enhancement 
Networks 


Yu Zhang, Xiaoguang Di, Bin Zhang, Ruihang Ji, and Chunhui Wang 


Abstract— Low light images suffer from severe noise, low brightness, low contrast, etc. Many image enhancement methods have been
proposed in previous research, but few of them can deal with these problems simultaneously. In this paper, to solve these problems
simultaneously, we propose a low light image enhancement method that can be combined with supervised learning and with previous
HSV (Hue, Saturation, Value) or Retinex model based image enhancement methods. First, we analyse the relationship between the HSV
color space and the Retinex theory, and show that the V channel of the enhanced image (the V channel in HSV color space equals the
maximum channel in RGB color space) can well represent the contrast and brightness enhancement process. Then, a data-driven
conditional re-enhancement network (denoted as CRENet) is proposed. The network takes the low light image as input and the enhanced
V channel as a condition, so that it can re-enhance the contrast and brightness of the low light image while reducing noise and color
distortion. It should be noted that any paired images with different exposure times can be used for training, so there is no need to
carefully select the supervised images, which saves considerable effort. In addition, it takes less than 20 ms to process a color image
with a resolution of 400×600 on a 2080Ti GPU. Finally, comparative experiments are conducted to demonstrate the effectiveness of the
method. The results show that the proposed method can significantly improve the quality of the enhanced image and that, by combining
with other image contrast enhancement methods, the final enhancement result can even be better than the reference image in contrast
and brightness. (Code will be available at https://github.com/hitzhangyu/image-enhancement-with-denoise)


Index Terms—Low Light, Image Enhancement, Denoising, Color correction 


1 INTRODUCTION 


WHEN the environment light is low, such as at night
or in a dark room, captured images will be low light
images. Such images usually suffer from low contrast,
low brightness, serious noise, and so on. Low light image
enhancement is used to solve these problems before
high-level tasks. In the past decades, researchers have
proposed many non-learning based image enhancement
methods, such as [1], [2], [3]. Recently, with the
development of deep learning, many supervised and
unsupervised learning based image enhancement methods
have been proposed, such as [4], [5], [6], [7], and have
achieved promising results. However, whether learning
based or not, these methods have not been able to solve
all of these problems in low light images simultaneously.

For non-learning based methods [3], [8], [9], most can
significantly improve the image contrast and brightness.
However, it is difficult for these methods to reduce or
suppress noise directly, and they may even amplify noise
or cause color distortion during the enhancement process.
A subsequent denoising operation often brings problems
such as blur and loss of details. For unsupervised image
enhancement works, since it is difficult to introduce prior
knowledge into the learning pipeline, especially for the
noise and color terms, noise and color distortion always
remain a problem [7]. Most supervised works [5], [10]
require a hyperparameter that connects the input and
reference images during training, because the
correspondence between input and reference images is not
one-to-one. The hyperparameter can be used to adjust the
contrast and brightness of the whole image during testing.
However, it is hard to obtain the hyperparameter
automatically, and the brightness and contrast of some
local areas may not be satisfactory even if the
hyperparameter is adjusted carefully.

In this paper, we provide a low-light image enhance- 
ment framework that can integrate unsupervised learning 
or non-learning methods with supervised learning methods 
to solve the low contrast, low brightness, noise and color 
distortion in low light images simultaneously. 

The effectiveness of supervised methods often depends
on the quality of the data. However, in low-level image
processing tasks, and especially in image enhancement, it
is difficult to obtain input/label pairs as in high-level
tasks. Although we can get images of the same scene with
different illumination by changing the lighting or
modifying the camera parameters, we cannot guarantee that
the reference image has good contrast and brightness in
every image block (e.g. Fig. 1 (e)-(f)). At the same time,
as one low light image can correspond to many high light
images, it is hard to ensure consistency in the selected
data (similar input image blocks should correspond to
reference image blocks of similar brightness). In fact, it
can be considered that there is no ground-truth image here.
The problem can then be summarized as how to train the
network without a unique or best ground-truth image, and
how to connect the input image and the reference image
when they do not meet the consistency condition during
training. Different from previous supervised works, which
introduce a single hyperparameter such as the time ratio
[10] into the enhancement process, we propose a new
framework that uses point-wise parameters related to
contrast for training, and these parameters can be obtained
automatically through other contrast enhancement methods
during testing. As can be seen in Fig. 1, by combining our
method with a contrast enhancement method such as gamma
correction, we can adjust the contrast of the enhanced
image and reduce the noise simultaneously. The contrast in
some local areas of the enhanced images is even better than
in the reference images, such as the bookcase and the seats
in the dark part of the stadium.


Fig. 1. Visual Comparison with Reference. Top Row: low-light images.
Middle Row: Reference images. Bottom Row: The results enhanced by
our method combined with Gamma correction.


As with some previous works [11], we divide the low
light enhancement problem into two sub-problems: the
contrast and brightness enhancement problem, and the
denoising and color restoration problem. Different from
previous methods, which solve the two sub-problems
separately, we use contrast enhancement methods to
generate point-wise contrast and brightness proposals and
use them as a condition to re-enhance the low light image
while reducing the noise and color distortion. The method
in this paper can be combined with any image contrast and
brightness enhancement method based on the HSV color space
or the Retinex model. The network can be trained without
carefully selecting reference images, and any paired images
with different light can be used for training.
Our contributions can be summarized as follows:


• A conditional re-enhancement network (denoted as
CRENet) for low light image enhancement is proposed,
which can solve low contrast, low brightness, noise
and color distortion simultaneously. The CRENet can
be combined with an existing image enhancement
method and uses the enhanced V channel as a condition
to achieve re-enhancement of low-light images. In
this process, the CRENet maintains the contrast and
brightness produced by the other enhancement method
while reducing noise and color distortion.

• Compared with other learning-based methods, the
hyperparameters in our method are point-wise and
directly related to image contrast and brightness.
By adjusting the brightness and contrast of local
areas, the enhanced image can therefore have better
contrast than the reference image, which is hard for
other learning-based methods with only a single
hyperparameter.

• By combining with other contrast and brightness
enhancement methods, the proposed method does not
need carefully selected, well-exposed reference
images, and any paired images with different
brightness can be used for training.


(a) Original low-light image 
(c) Gamma correction & BM3D (d) Gamma correction & our method


Fig. 2. Visual Comparison with BM3D. (Best viewed on high-resolution 
displays with zoom-in.) 


2 RELATED WORKS 


A. Non-learning based image enhancement methods

Non-learning based single low light image enhancement
methods mainly comprise histogram equalization, methods
based on the dehazing or Retinex model, and improved
variants of these methods.

Histogram Equalization (HE) is one of the most widely
used methods. However, it cannot avoid problems such as
loss of details, poor color restoration, and noise
amplification. Although many improved methods have been
proposed to solve those problems [8], [9], [12], [13],
[14], [15], many problems remain when applying histogram
equalization directly to image enhancement.

Dong et al. [16] first proposed a method based on the
dehazing model, and later studies extended this work [17].
Although these methods achieve good results, they lack a
corresponding physical model, which limits their
application in various scenes.

Some works based on the Retinex model have been proposed
to maintain image details and naturalness [3], [18], [19].
However, the denoising process before or after the
enhancement process still causes blur or loss of details.
To suppress noise and halo effects and to preserve more
details, many algorithms based on the variational Retinex
model have been proposed and have achieved good results,
such as [2], [20], [21], [22]. However, most of them are
time-consuming because multiple iterations are needed to
solve the variational equation.

Most non-learning methods focus on contrast and
brightness enhancement, and then use general denoising
methods (like BM3D [23]) and white balance to remove
noise and correct the color. However, those methods always
introduce blur and cannot solve the problem of color
distortion. As shown in Fig. 2, although Gamma correction
improves the contrast and brightness of the image, some
details disappear after the BM3D denoising operation.

B. Learning based image enhancement methods

As far as we know, LLNet [37] is the first work to use deep
neural networks to solve the image enhancement problem.
It proposes to train the networks with synthetic noisy and
dark images separately, but it does not consider the
characteristics of natural images. Some other methods also
use synthetic data sets, such as [24], [25], [26]. Although
the data obtained by these methods looks like low light
images, it hardly reflects some characteristics of real low
light images, such as noise, color distortion, and the
coexistence of overexposed and underexposed areas in the
same image.

Chen et al. [10] introduce a dataset that contains real
raw low light images and the corresponding raw high light
images for training. They introduce an amplification ratio
to connect the input and reference images and multiply a
certain layer in the network by this ratio. The ratio is
set to the exposure time difference between the input and
reference images during training. This method can solve
the problem of noise and color distortion well. However,
the ratio must be chosen by the user during testing, which
limits the wide use of this method, and it cannot solve
the inappropriate contrast problem well. In effect, this
algorithm provides a way to adjust the exposure time in
software instead of physically adjusting it. Adjusting the
exposure time alone cannot work well in night scenes,
because some over-enhanced (saturated) or under-enhanced
areas will remain in the image. In our previous work [27],
we provide a way to automatically learn the expected
exposure time, but it still cannot solve the problem of
contrast, because we can never provide ground-truth
reference images with proper contrast in every local area,
and without other constraints it is difficult to solve
this problem.

In [4], the Retinex model is introduced into the training
process to connect the reflection images of the input and
the reference. However, due to the lack of constraints on
the noise of the reflection image, additional denoising
methods (like BM3D [23]) still need to be introduced. The
method proposed in [5] resembles a combination of [10]
and [4]. Compared to [4], [5] provides an extra brightness
ratio to connect the illumination images of the input and
the reference, and adds a subnet called restoration-net to
achieve denoising. However, during testing, the ratio
parameter needs to be adjusted manually to obtain better
enhancement results. Although these methods use real low
light data for training, they do not constrain the contrast
of the enhanced image, which causes over-enhancement
(saturation) or under-enhancement in the enhanced image.

In our previous work [6], we provide a max entropy
Retinex model to achieve self-supervised learning while
constraining the contrast during training. However, due to
the lack of strong constraints on color and noise, the
enhanced image still looks like a night image in color and
the noise cannot be removed well. During testing, we also
cannot guarantee good contrast in every local area.

Xiong et al. [7] decompose the low light image
enhancement task into two stages, contrast enhancement and
noise removal, and propose an unsupervised framework.
However, in the absence of constraints on the image
contrast, the contrast and brightness after enhancement
may still be unsatisfactory. In this paper we also show
that the Retinex model used in their first stage cannot
ensure that the color information is close to that of a
real daytime image.

Although these deep learning based methods have
achieved good visual effects in low light image
enhancement, they still cannot solve low contrast, low
brightness, noise and color distortion simultaneously.


3 METHOD 


3.1 Relationship between HSV color space and Retinex model


Recently, many low-light image enhancement works have been
based on the following Retinex model:


$$F = R \circ I \tag{1}$$


where $F$ and $I$ represent the captured image and the
illumination image respectively, $R$ represents the
reflection or the desired image, and $\circ$ represents
element-wise multiplication. Most of the works assume that
the three color channels of the image have the same
illumination in order to simplify the model [4], [5], and
the maximum value of the three color channels is generally
used as the initial estimate of the illumination map [3].
In the following description, we refer to this simplified
Retinex model as the Retinex model. Through a simple
transformation, it can be proven that image enhancement
methods based on this simplified Retinex model are
equivalent to performing the enhancement operation on the
V channel in HSV color space while keeping the H and S
channels unchanged.

The color image can be divided into three channels
according to the value of each pixel in RGB color space:


$$
L(x) = \max_{c \in \{R,G,B\}} F^c(x), \qquad
M(x) = \operatorname{mid}_{c \in \{R,G,B\}} F^c(x), \qquad
N(x) = \min_{c \in \{R,G,B\}} F^c(x) \tag{2}
$$


where x represents an individual pixel. Before the process 
of image enhancement, the captured image F in HSV color 
space can be expressed as follows: 


$$V_b(x) = L(x) \tag{3}$$

$$S_b(x) = \frac{L(x) - N(x)}{L(x)} \tag{4}$$

$$H_b(x) = c_1 + c_2 \, \frac{M(x) - N(x)}{L(x) - N(x)} \tag{5}$$

where $c_1$ can be 0, 120, or 240 (in degrees) and $c_2$ can
be $\pm 60$. Their values depend on the ordering of the three
color channels, so for a given image $c_1$ and $c_2$ are fixed
at each pixel. $V_b$, $S_b$, $H_b$ represent the V, S and H
channels of the low light image before enhancement,
respectively, and the subscript $b$ stands for before.

Based on the Retinex model, the desired image R can be 
obtained by the following formula: 


$$R(x) = F(x) / (I(x) + \varepsilon) \tag{6}$$

where $\varepsilon$ is a very small constant to avoid a zero
denominator. For simplicity of writing, we omit $\varepsilon$
in the following formulas and directly use $I(x)$ to represent
$I(x) + \varepsilon$.

According to Equation (6), the enhanced image R in 
HSV color space can be expressed as follows: 


$$V_a(x) = L(x) / I(x) = V_b(x) / I(x) \tag{7}$$

$$S_a(x) = \frac{L(x)/I(x) - N(x)/I(x)}{L(x)/I(x)} = S_b(x) \tag{8}$$

$$H_a(x) = c_1 + c_2 \, \frac{M(x)/I(x) - N(x)/I(x)}{L(x)/I(x) - N(x)/I(x)} = H_b(x) \tag{9}$$

where $V_a$, $S_a$, $H_a$ represent the V, S and H channels of
the low light image after enhancement, respectively, and the
subscript $a$ stands for after.

It can be seen that the H and S channels of the captured
image and the enhanced image are the same, and the
enhancement operation only acts on the V channel. The
enhanced V channel can therefore well represent the image
enhancement operation for methods based on HSV or the
Retinex model. However, hue and saturation obviously differ
between images taken at night and during the day, and
likewise between low light and high light images. This is
caused by the non-linearity of the camera response curve,
and different color channels even have different response
curves. Consequently, methods based on the simplified
Retinex model cannot ensure that the color of the enhanced
image looks like that of a real image captured in daytime
or under high light.
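
The invariance expressed by Equations (7) to (9) can be checked numerically. The following minimal Python sketch is an illustration only and is not part of the released code: it divides one RGB pixel by a shared illumination value, as in Equation (6), and compares the HSV representation before and after. The pixel values, the illumination value, and the helper name `retinex_enhance` are assumptions chosen for the example.

```python
import colorsys


def retinex_enhance(rgb, illumination, eps=1e-6):
    """Divide every channel by the same illumination value, as in Eq. (6)."""
    return tuple(channel / (illumination + eps) for channel in rgb)


# A dark pixel in [0, 1] and an assumed illumination estimate for it.
pixel = (0.10, 0.06, 0.03)
illumination = 0.25

h_b, s_b, v_b = colorsys.rgb_to_hsv(*pixel)
h_a, s_a, v_a = colorsys.rgb_to_hsv(*retinex_enhance(pixel, illumination))

# Hue and saturation are unchanged; only V is scaled by 1 / I.
print(f"H: {h_b:.4f} -> {h_a:.4f}")
print(f"S: {s_b:.4f} -> {s_a:.4f}")
print(f"V: {v_b:.4f} -> {v_a:.4f}")
```

Because hue and saturation only depend on ratios of channel differences, any per-pixel scale factor shared by the three channels cancels out, which is why such methods cannot change the color of the low light image.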

Therefore, we provide a supervised image enhancement
method called the Conditional Re-Enhancement Network
(CRENet), which takes the enhanced V channel as an
additional point-wise parameter so that the network can
focus on learning how the H and S channels change with the
V channel. At the same time, with the enhanced V channel as
an additional parameter, we can benefit from previous
research [3], [6], [8] on image contrast and brightness
enhancement, and supervised learning also performs well in
noise suppression. In this way, we can simultaneously solve
the problems of low contrast, low brightness, noise, and
color distortion in low light images.

Next, we provide the details of the network and training,
and further explain the rationale for using the V channel
instead of a single parameter.




3.2 Conditional re-enhancement network


According to the camera response curve, a pixel of the input
and reference images can be expressed as follows:


$$X_i^c = G_c(E_i \Delta t_l), \qquad Y_i^c = G_c(E_i \Delta t_h) \tag{10}$$


where $X$ and $Y$ represent the input low light image and
the reference image respectively, $c$ represents one of the
color channels, $G_c$ represents the camera response curve
of channel $c$, $i$ represents the pixel, $\Delta t_l$ and
$\Delta t_h$ represent the exposure times of the low light
image and the high light image respectively, and $E_i$
represents the irradiance. Obviously, different reference
images may have different exposure times



Fig. 3. RGB values of one pixel under different exposures. 


TABLE 1
Conditional re-enhancement network structure

Inputs             Operator    Kernel  Output Channels  Stride  Output Name
RGB & Enhanced V   Conv&ReLU   3x3     32               1       Conv0
RGB & Enhanced V   Conv        9x9     64               1       Conv
Conv               Conv&ReLU   3x3     64               1       Conv1
Conv1              Conv&ReLU   3x3     128              2↓      Conv2
Conv2              Conv&ReLU   3x3     128              1       Conv3
Conv3              Conv&ReLU   3x3     64               2↑      Conv4
Conv4 & Conv1      Concat      -       128              -       Conv5
Conv5              Conv&ReLU   3x3     64               1       Conv6
Conv6 & Conv0      Concat      -       96               -       Conv7
Conv7              Conv        3x3     64               1       Conv8
Conv8              Conv        3x3     3                1       Conv9
Conv9              Sigmoid     -       3                -       Enhanced


for better visual effects. Therefore, most supervised works
must include at least one time-related term $\alpha$ in order
to connect the input and reference images.

Then, most of the learning based models can be described as
follows:


$$\max_{\theta} \; p(\theta \mid f_\theta, Y, X, \alpha) \tag{11}$$


where $f$ and $\theta$ represent the enhancement network and
its parameters respectively, and $p$ represents the
probability. $\alpha$ represents the hyperparameter related to
the time difference, which is used to connect $X$ with the
manually selected $Y$; previous works adopted the time ratio
$\alpha = \Delta t_h / \Delta t_l$ [10], the average
brightness ratio $\alpha = \mathrm{mean}(Y/X)$ [5], etc. As
mentioned before, there are two obvious problems in previous
works. First, it is difficult to assign a value to $\alpha$
automatically in the application phase, whether it is a time
or an average brightness difference, because those values
are not directly related to image contrast. Second, a single
parameter can only adjust the overall brightness of the
image and cannot guarantee good contrast in every local
area. This can be illustrated through the following
Equations (12) and (13).

According to Bayes' rule, by taking the negative logarithm
of Equation (11), the training phase can be expressed as
follows:


$$\max_{\theta} \; p(\theta \mid f_\theta, Y, X, \alpha(X, Y)) = \min_{\theta} \; \| f_\theta(X, \alpha(X, Y)) - Y \| \tag{12}$$

If we assume that $Y$ is the optimal reference in a series of
images with different exposure times and that we can manually
select $\alpha$, the best result during testing without other
prior losses can be expressed as follows:

where $\hat{Y}$ represents the enhanced image. It can be seen
that the best result of the network can hardly exceed the
manually selected reference $Y$. In addition, in many low
light scenes we can never get the optimal image as a
ground-truth reference by adjusting the exposure time
(e.g. Fig. 1 (e)-(f)), which means that we cannot obtain a
satisfactory image by adjusting only a single parameter
$\alpha$.

In fact, without other priors, an end-to-end supervised
method cannot solve the problem of low contrast in low
light images, or even the problem of low brightness, unless
we manually adjust the reference images carefully or
synthesize reference images from multiple images. This,
however, brings other problems, such as increased time cost
and the fact that the adjusted images cannot guarantee
brightness consistency (the same low light image patches
may correspond to different reference images). To solve the
above problems, we propose the CRENet to explicitly control
the contrast and brightness of the enhanced image. The
CRENet takes the V channel enhanced by another image
contrast enhancement method as a point-wise condition and
re-enhances the contrast and brightness of the low light
image according to this condition.

The V channel is the maximum of the three color channels at
each pixel, so the V channel of the reference image can be
expressed with the camera response curve as follows:

$$V_{ih} = G_V(E_i \Delta t_h) \tag{14}$$

We make the same reasonable assumption that the function
$G$ is monotonically increasing [28]. Therefore, for a fixed
pixel $i$, a given $V_{ih}$ determines a certain $\Delta t$,
which means that the V channel is enough to connect the
other two channels between the input and the reference
during training, since the three color channels share the
same exposure time. As shown in Fig. 3, as the exposure
changes, the R, G, B values at one pixel¹ form a curve in 3D
space which is related to the camera response curve. Our
motivation is to let the network learn this curve, so that
with a given V value it can obtain the values of the other
two channels (HSV and RGB are just different color
representations, so we directly implement it in the RGB
space when designing and training the network). The V
channel can then be treated as a point-wise parameter with
which we can apply different levels of enhancement to
different areas, and from Equations (2) to (7) it can be
seen that the V channel can well represent the processing
results of other contrast enhancement methods.

During training, we can take the V channel of the reference
image as the point-wise parameter $\alpha$, which can be
expressed as follows:

$$\alpha = \max_{c \in \{R,G,B\}} Y^c + n \tag{15}$$


1. The data comes from real images taken under 40 different exposure 
conditions. Under each exposure condition, we collected 80 images and 
averaged them to reduce the noise. 
Fig. 4. The structure of our training pipelines.


where $n$ represents Gaussian noise, which is introduced in
order to avoid the identity transformation and at the same
time to simulate the noise in the V channel during testing.
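
As a rough illustration of Equation (15), the sketch below builds the training condition from a reference image stored as nested lists of RGB tuples in [0, 1]. It is a minimal sketch under stated assumptions and not the authors' implementation: the noise standard deviation `sigma`, the clipping to [0, 1], and the function names are all hypothetical choices.

```python
import random


def v_channel(image):
    """Per-pixel maximum over R, G, B, i.e. the V channel of HSV."""
    return [[max(pixel) for pixel in row] for row in image]


def training_condition(reference, sigma=0.02, rng=None):
    """Eq. (15): V channel of the reference plus Gaussian noise.

    sigma and the clipping to [0, 1] are assumptions for this sketch.
    """
    rng = rng or random.Random()
    return [
        [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
        for row in v_channel(reference)
    ]


# A tiny 1x2 "reference image" used only for demonstration.
reference = [[(0.8, 0.5, 0.2), (0.1, 0.3, 0.6)]]
clean_v = v_channel(reference)
noisy_alpha = training_condition(reference, sigma=0.02, rng=random.Random(0))
print(clean_v, noisy_alpha)
```

Without the noise term the network could simply copy the condition into the brightest output channel, so perturbing the condition forces it to rely on the low light input as well.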

Meanwhile, we can perform brightness mapping on the V
channel of the low light image according to the reference to
get $\alpha$, which can then be expressed as follows:


$$\alpha_i = w \max_{c \in \{R,G,B\}} X_i^c \tag{16}$$


where $w$ is the average brightness ratio, which is calculated
on every local area $\Omega$ around pixel $i$ and can be
expressed as follows:


$$w = \frac{\sum_{x \in \Omega(i)} Y(x)}{\sum_{x \in \Omega(i)} X(x)} \tag{17}$$
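
The local brightness mapping of Equations (16) and (17) can be sketched as follows. This is a minimal pure-Python sketch, assuming that $X(x)$ and $Y(x)$ are taken as V-channel values, that the local area is a square window of hypothetical size `2 * radius + 1`, and that the result is clipped to 1; none of these details are specified in the text.

```python
def local_brightness_ratio(x_v, y_v, radius=1, eps=1e-6):
    """Eq. (17): ratio of local sums of the reference and low light V channels."""
    height, width = len(x_v), len(x_v[0])
    ratio = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            rows = range(max(0, i - radius), min(height, i + radius + 1))
            cols = range(max(0, j - radius), min(width, j + radius + 1))
            sum_y = sum(y_v[r][c] for r in rows for c in cols)
            sum_x = sum(x_v[r][c] for r in rows for c in cols)
            ratio[i][j] = sum_y / (sum_x + eps)
    return ratio


def mapped_condition(x_v, y_v, radius=1):
    """Eq. (16): scale the low light V channel by the local ratio."""
    ratio = local_brightness_ratio(x_v, y_v, radius)
    return [
        [min(1.0, w * v) for w, v in zip(w_row, v_row)]
        for w_row, v_row in zip(ratio, x_v)
    ]


# Toy V channels: the reference is four times brighter than the input.
low_v = [[0.05, 0.10], [0.10, 0.20]]
ref_v = [[0.20, 0.40], [0.40, 0.80]]
alpha = mapped_condition(low_v, ref_v)
print(alpha)
```

In contrast to the condition of Equation (15), this condition inherits the real noise of the low light input, since only its local brightness is matched to the reference.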


During testing, $\alpha$ can be expressed as follows:


$$\alpha = \max_{c \in \{R,G,B\}} g(X^c) \tag{18}$$


where $g$ represents any contrast and brightness enhancement
method in HSV color space or based on the Retinex model,
such as histogram equalization, LIME [3] and self-supervised
methods [6].
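
For instance, with gamma correction as the enhancement method $g$, the test-time condition of Equation (18) could be computed as in the sketch below. The gamma value, the image layout (nested lists of RGB tuples in [0, 1]), and the function names are assumptions made for illustration; any other HSV or Retinex based enhancer could be passed in place of `gamma_correct`.

```python
def gamma_correct(value, gamma=2.2):
    """Simple gamma correction of one channel value in [0, 1]."""
    return value ** (1.0 / gamma)


def inference_condition(image, enhance=gamma_correct):
    """Eq. (18): enhance each channel, then take the per-pixel maximum."""
    return [
        [max(enhance(channel) for channel in pixel) for pixel in row]
        for row in image
    ]


# A tiny 1x2 low light image; gamma = 2 keeps the numbers easy to follow.
low_light = [[(0.04, 0.01, 0.00), (0.09, 0.16, 0.01)]]
alpha = inference_condition(
    low_light, enhance=lambda v: gamma_correct(v, gamma=2.0)
)
print(alpha)
```

Since gamma correction is monotonic, taking the maximum after enhancement gives the same result as enhancing the V channel of the input directly.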

The training procedure is achieved by minimizing the 
loss between the enhanced image and the supervised image, 
and the loss can be expressed as follows: 


$$\mathcal{L} = \| \hat{Y} - Y \|_1 + \mathrm{SSIM}(\hat{Y}, Y) \tag{19}$$


where SSIM represents the structural similarity measurement
[29]. We have also tested some other loss functions, such
as perceptual loss, color loss expressed by an outer product
as in [7], loss in the H and S channels in HSV space, and
gradient loss [5], but the effect of these loss functions on
the results is not obvious on the LOL dataset, neither in
visual effects nor in quantitative metrics.

The architecture of our CRENet is shown in Table 1 and
the training pipeline is shown in Fig. 4. It should be noted
that our method has no special requirements for the network
architecture. The network can be simplified or designed more
carefully to reduce running time or further improve the
results. We have tried more complicated networks and found
that a more complex network structure may bring better
visual effects. However, this is not the focus of this
paper, so we do not show the relevant results in the
experiment part.
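
To make the data flow of Table 1 easier to follow, the sketch below traces feature-map shapes through the listed layers and checks that the concatenations produce the stated channel counts. It is only bookkeeping, not a trainable model. It assumes a 4-channel input (RGB plus the enhanced V channel), resolution-preserving ("same") padding, and that the two stride-2 entries in the table denote one downsampling and one upsampling step; these readings are assumptions.

```python
# (name, inputs, operator, output channels, resolution change) from Table 1.
LAYERS = [
    ("Conv0", ["Input"], "conv", 32, "same"),
    ("Conv", ["Input"], "conv", 64, "same"),
    ("Conv1", ["Conv"], "conv", 64, "same"),
    ("Conv2", ["Conv1"], "conv", 128, "down"),  # assumed stride-2 downsampling
    ("Conv3", ["Conv2"], "conv", 128, "same"),
    ("Conv4", ["Conv3"], "conv", 64, "up"),     # assumed stride-2 upsampling
    ("Conv5", ["Conv4", "Conv1"], "concat", 128, "same"),
    ("Conv6", ["Conv5"], "conv", 64, "same"),
    ("Conv7", ["Conv6", "Conv0"], "concat", 96, "same"),
    ("Conv8", ["Conv7"], "conv", 64, "same"),
    ("Conv9", ["Conv8"], "conv", 3, "same"),
    ("Enhanced", ["Conv9"], "sigmoid", 3, "same"),
]


def trace_shapes(height, width):
    """Return {layer name: (height, width, channels)} for a given input size."""
    shapes = {"Input": (height, width, 4)}  # RGB + enhanced V
    for name, inputs, operator, out_channels, change in LAYERS:
        in_shapes = [shapes[source] for source in inputs]
        h, w = in_shapes[0][:2]
        if operator == "concat":
            if any(shape[:2] != (h, w) for shape in in_shapes):
                raise ValueError(f"{name}: spatial sizes do not match")
            channels = sum(shape[2] for shape in in_shapes)
            if channels != out_channels:
                raise ValueError(f"{name}: expected {out_channels} channels")
        else:
            if change == "down":
                h, w = h // 2, w // 2
            elif change == "up":
                h, w = h * 2, w * 2
            channels = out_channels
        shapes[name] = (h, w, channels)
    return shapes


shapes = trace_shapes(48, 48)  # 48x48 training patches
for name, shape in shapes.items():
    print(f"{name:9s} {shape}")
```

Under these assumptions the two skip connections join feature maps of equal resolution, and the output has the same size as the input with three channels.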


4 EXPERIMENT 
4.1 Implementation Details 


The LOL (Low Light) dataset [4], which contains 500 image
pairs, is used for training and testing. 485 image pairs of
the dataset are used for training, and the image size is
400×600. During training, the batch size is set to 48 and
the patch size is set to 48×48. We use the Adam optimizer
[31] to train the network and the learning rate is set to
0.001. The training and testing of the network are
implemented on an NVIDIA RTX 2080Ti GPU and an Intel Core
i9-9900K CPU, and the code is based on the TensorFlow
framework.


4.2 Performance Evaluation 


In this part, we show the results of combining our method
with some previous methods, including methods in the HSV
color space, such as LAHE (Local Area Histogram
Equalization) and Gamma correction, and methods based on
the Retinex model, such as LIME [3], self-supervised image
enhancement [6], Retinex-Net [4] and KinD [5]. We also
compare our method with BM3D in terms of denoising. Three
metrics are adopted for quantitative comparison: PSNR,
SSIM, and NIQE [32]. NIQE is a no-reference image quality
assessment method that evaluates the naturalness of the
image, and a lower value indicates better quality. PSNR and
SSIM are full-reference image quality assessment methods,
which indicate the noise level and the structural
similarity between the result and the reference,
respectively. Please note that SSIM and PSNR are mainly
used to evaluate structure and noise, but brightness
differences have a serious impact on these metrics. In
order to better evaluate the results generated by different
methods, we first perform a local brightness mapping on the
enhanced image to exclude the influence of brightness
differences before computing SSIM and PSNR, in the same way
as in Equation (16). The mapping operation acts on the V
channel and keeps the H and S channels unchanged.
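
The effect of brightness differences on PSNR can be seen in a small numerical sketch. For brevity it uses a single global brightness ratio on a flat list of V values instead of the local mapping described above; this simplification, the toy values, and the function names are assumptions for illustration.

```python
import math


def psnr(result, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equally long value lists."""
    mse = sum((a - b) ** 2 for a, b in zip(result, reference)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def match_brightness(result, reference):
    """Scale the result so that its mean brightness equals the reference's."""
    ratio = sum(reference) / sum(result)
    return [ratio * value for value in result]


# A noise-free result that is simply half as bright as the reference.
reference = [0.2, 0.4, 0.6, 0.8]
darker = [0.5 * value for value in reference]

before = psnr(darker, reference)
after = psnr(match_brightness(darker, reference), reference)
print(f"PSNR before mapping: {before:.2f} dB")
print(f"PSNR after mapping:  {after:.2f} dB")
```

Although the darker result contains no noise at all, its PSNR before mapping is low, which is why the brightness mapping is applied before the reference-based metrics are computed.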

Table 2 and Fig. 5 show the experimental results on the LOL
dataset when our method is combined with different image
enhancement methods. In Table 2, it can be seen that for
most of the existing methods, our method can significantly
reduce the noise, improve the structure similarity, and
make the result more natural. As shown in Fig. 5, taking
images enhanced by other methods as the condition, the
CRENet keeps contrast and brightness similar to those
enhanced images while reducing their noise and color
distortion. It should also be noted that the CRENet does
not introduce obvious blur in the process of denoising, and
even for the LAHE method, which produces severe noise, the
re-enhancement network works well (e.g. Fig. 5 (b) and
(j)). For the KinD [5] method, our method does not improve
the PSNR and SSIM, because there is little noise and color
distortion left on the LOL dataset after KinD [5]. However,
KinD can be regarded as a whole-image exposure time
adjustment method, which means that it cannot ensure proper
contrast or brightness in every local area. As shown in
Fig. 6, we have tried different ratios, such as 5 and 8,
while the authors suggest a maximum of 5. It can be

TABLE 2
Quantitative comparison on LOL dataset in terms of NIQE, PSNR, and SSIM. Ori shows the results of original method


Metrics        LAHE    GAMMA   LIME [3]   RetinexNet [4]   KinD [5]   Self-supervised [6]   Zero-DCE [30]
NIQE   ori     13.51   10.36   9.13       9.73             3.89       3.72                  8.22
       BM3D    8.50    4.08    4.03       4.61             4.06       4.15                  4.10
       ours    3.75    3.68    3.63       3.68             3.60       3.63                  3.50
PSNR   ori     12.88   21.95   21.06      21.63            25.82      24.37                 21.68
       BM3D    15.35   21.99   22.22      22.41            25.69      24.64                 22.65
       ours    24.92   24.83   24.48      24.81            25.43      24.76                 25.11
SSIM   ori     0.32    0.62    0.60       0.61             0.84       0.75                  0.62
       BM3D    0.41    0.61    0.64       0.64             0.83       0.76                  0.64
       ours    0.80    0.80    0.80       0.80             0.83       0.80                  0.82




Fig. 5. The results enhanced by some methods and re-enhanced by our method. (a) Original. (b)-(h) are LAHE, Gamma correction, LIME 
[3], RetinexNet [4], KinD [5], Self-supervised [6], Zero-DCE [30], respectively. (i) Reference. (j)-(p) are the re-enhanced results by our method 
corresponding to (b)-(h). 


seen that the lower right corner area of the image cannot be
enhanced well even if we change the ratio. In fact, we even
tried a ratio parameter of 100 in the experiment, but still
could not get good results.

Also, we study the influence of different sources of
$\alpha$ on training: the reference image with noise, the low
light image after mapping, and a mixture of the two. The
results are shown in Table 3 and Fig. 7. It can be seen that
when the condition $\alpha$ comes from the mixture of the low
light and high light images, the network shows better
results on the evaluation metrics. In Fig. 7, it can be
found that when $\alpha$ comes from the reference, some low
frequency noise and impulse noise remain, as in Fig. 7 (b).
However, if $\alpha$ comes from the low light images with the
brightness mapping operation, there is less noise. This is
because we only add Gaussian noise to the reference images
when obtaining $\alpha$, which cannot simulate the real noise
in the test process very well. It also shows that real low
light images contain far more complex noise than Gaussian
noise, which means that it is hard to obtain promising
results in real low light image enhancement tasks with only
synthetic low light images. In previous works, training
with real data also brings some problems, such as the need
to adjust parameters during testing and to carefully select
references during training, and the enhanced results can
hardly exceed the reference. The proposed method, however,
can well solve those problems.


TABLE 3 
The influence of different $\alpha$ during training on the evaluation metrics


Reference + Noise
Lowlight + Mapping
Mixture


3.72, 24.96 0.823 




Fig. 6. Visual Comparison with single hyperparameter based method. 
(a) Original. (b) KinD [5] with ratio 5.(c) KinD [5] with ratio 8. (d) Gamma 
correction with our method. (Best viewed on high-resolution displays 
with zoom-in.) 


We also tested the consistency of the output results when
the input images show the same scene with different
brightness, as shown in Fig. 8. It can be seen that the
enhanced results basically maintain consistent brightness
in the same blocks.


(a) Original low-light image (b) Reference + Noise (d) Mixture


Fig. 7. Visual comparison of different sources of $\alpha$ during training. (Best
viewed on high-resolution displays with zoom-in.)


Fig. 8. Brightness consistency test results. Top Row: low light images 
with different exposure time. Bottom Row: the enhanced results by our 
method combined with HE. (Best viewed on high-resolution displays with 
zoom-in.) 


5 CONCLUSION 


In this paper, we propose a conditional re-enhancement
network for low-light image enhancement to solve low
contrast, low brightness, noise and color distortion
simultaneously. The network can be combined with any
contrast enhancement method based on the HSV color space or
the simplified Retinex model, and the V channel enhanced by
such a method is treated as the condition to achieve
re-enhancement. The experimental results show the
effectiveness and advantages of the method. There are still
some shortcomings: the final enhancement result depends on
the contrast enhancement method, and the image saturation
is reduced in some areas (e.g. Fig. 1 (1)). Future research
will focus on applying the technique to long-term and
night-time localization, in order to eliminate the
interference of light on feature extraction and to generate
corresponding night-time data sets, and on using extremely
low light images or raw images for training.